Introduction to Neural Nets - part 2

by Arthur Ed LeBouthillier

This article was published in the August 1999 issue of The Robot Builder.

This is a follow-up to the article "Introduction to Neural Nets - part 1".

WARNING - The attorney general has concluded that this article contains mathematics that may be hazardous to your health.

In our last installment (The Robot Builder, July 1999), we introduced the basic neural net model and showed how it might be implemented in a program. In this article we will review a common learning algorithm that would modify the weights in the program provided last month.

How do we teach a neural net?

We know that the intelligence in a neural net is embodied in the connections between cells and the weights on those connections. In the multi-layer perceptron (MLP), the cells are connected in layers: the input layer feeds into a hidden layer, and the hidden layer feeds into the output layer. Since the MLP fixes the connection structure, all that remains is to determine the weights between the cells of one layer and the next.
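
In code, "determining the weights" just means filling in one array of numbers per pair of adjacent layers. Here is a minimal sketch in Python with the numpy library (rather than the pseudocode used in the rest of this article); the array names match the pseudocode that appears later, and the layer sizes are borrowed from the author's digit-recognition example near the end of the article.

import numpy as np

# The only things an MLP learns: one weight array per pair of adjacent layers.
n_input, n_hidden, n_output = 35, 10, 4         # sizes from the example later on
HiddenWeight = np.zeros((n_input, n_hidden))    # input  -> hidden connections
OutputWeight = np.zeros((n_hidden, n_output))   # hidden -> output connections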

Once the proper weights are determined, we no longer need the training algorithm; the program described last month can then put the neural net to work on a useful purpose. So how do we do it? We will modify the weights so that the neurons that properly determine the output are enhanced and those that generate the wrong output are inhibited, each in proportion to the degree of change it produces in the result. Our job, through the learning algorithm, is to tell the neural net what the proper output should be for each given input, and we must do this for all of the desired training samples. After the weights have been modified appropriately, the network will be trained to provide the proper output for each of the provided inputs.

If we concentrate on what the output should be and then, working backwards toward the input, modify each layer's weights so that the network produces the proper answer for a training sample, we will have the correct answer for that one sample. We then only need to repeat this over all of the samples until the difference between the desired output and the actual output is small. This is the basis of the backpropagation learning algorithm.

The Backpropagation Algorithm

The backpropagation learning algorithm works by propagating the error at the output layer backwards toward the hidden layer and then modifying the hidden layer weights so that the network's output more accurately matches what it should be for each input. Each weight is modified only in proportion to the amount of change it produces in the output. The process is repeated for all training samples until the network is trained. There are detailed mathematical explanations of why this works, but we won't cover them here. Suffice it to say that the algorithm treats the network's total error as a surface over all of its weights and then heads "down hill" on that surface until the output error is minimized. It continuously modifies the weights until the total output error for the training set is less than a given threshold.
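
A toy picture of the "down hill" idea, written in Python rather than the article's pseudocode: treat the error as a function of a single weight, probe which way is downhill, and keep stepping that way. The total_error function below is made up purely for illustration; in the real network the surface is over all of the weights at once, and the algorithm computes the downhill direction directly instead of probing for it.

def total_error(w):
    # a made-up, one-weight error surface with its minimum at w = 0.7
    return (w - 0.7) ** 2

w = 0.0          # start somewhere on the surface
step = 0.1       # how far to move each time
for _ in range(50):
    # estimate the slope, then move in the downhill direction
    slope = (total_error(w + 0.001) - total_error(w)) / 0.001
    w = w - step * slope
print(w)         # ends up very close to 0.7, the bottom of the surface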

Repeat
   total_error = 0
   For each training-sample input, x
      1) Generate the actual output of the neural net, y, for this
         training sample, x, with the current weights.
      2) Generate the desired output of the neural net, t.
      3) Compute the output layer change
         a) total_error = total_error + absolute(output_layer_change)
      4) Modify the output layer weights
      5) Compute the hidden layer change
         a) total_error = total_error + absolute(hidden_layer_change)
      6) Modify the hidden layer weights
   next training-sample
while total_error > threshold

Figure 1 - The backpropagation learning algorithm


As shown in figure 1, the algorithm is simple on its surface. The catch is that each of the steps that compute the adjustments and modify the weights is a little bit involved, so we'll review each of them in turn. Prior to training, you must initialize all of the weights to small random values. This keeps the network from getting stuck in a never-ending loop.

Computing the Output Layer Change

The change for any output layer cell, j, is computed by the formula in figure 2:

for j = 0 to number_of_output_layer_cells
    output_delta[j] = y[j] * (1-y[j]) * (t[j] - y[j])
    total_error = total_error + abs(output_delta[j])
next j

Figure 2 - Computing the output layer error

This routine says that the amount of modification, output_delta[j], for output layer cell j is equal to the actual output, times one minus the actual output, times the difference between the desired output and the actual output. In addition, we accumulate the size of the change for each cell in the total_error variable.
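
As a concrete check of the figure 2 formula, here is a small Python sketch for a single output cell, with made-up numbers. It assumes the output cells use a sigmoid activation, which is what the y * (1 - y) factor implies: that factor is exactly the slope of the sigmoid, so the correction is large while the cell is still undecided and small once it has saturated near 0 or 1.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

net = 0.3                # hypothetical weighted-sum input to one output cell
y = sigmoid(net)         # actual output of the cell, about 0.574
t = 1.0                  # desired output for this training sample

output_delta = y * (1.0 - y) * (t - y)
print(output_delta)      # about 0.104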

Computing the Output Layer Weight Change

Now that we know the amount of change required for each output cell, we go ahead and modify the weights feeding each cell accordingly. The code in figure 3 shows how this is done. For each output layer cell, j, we modify the weight between that cell and each hidden layer cell, i. We do that by multiplying the output of the hidden layer cell, HiddenLayer[i], by the amount of change needed for the output cell, output_delta[j], and adding the result to the current value of the weight, OutputWeight[i, j]. The value Nu is a learning-rate constant and determines how quickly the algorithm learns; you will see it again when we modify the hidden layer weights. It is usually a value such as 0.8, in the range of 0 to 1, indicating that the weights should be adjusted by 80 percent of the computed error.

for i = 0 to number_of_hidden_layer_cells
   for j = 0 to number_of_output_layer_cells
      OutputWeight[i, j] = OutputWeight[i, j] + Nu * output_delta[j] * HiddenLayer[i]
   next j
next i

Figure 3 - Changing the Output layer weights
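
If the layer outputs and weights are stored as arrays, the two nested loops of figure 3 collapse into a single outer-product update. The sketch below uses Python with numpy and made-up array contents; the shapes follow the article's [hidden, output] indexing of OutputWeight.

import numpy as np

Nu = 0.8
HiddenLayer = np.random.rand(10)                       # hidden layer outputs
output_delta = np.random.rand(4)                       # from the previous step
OutputWeight = np.random.uniform(-0.5, 0.5, (10, 4))   # OutputWeight[i, j]

# OutputWeight[i, j] += Nu * output_delta[j] * HiddenLayer[i], for every i and j
OutputWeight += Nu * np.outer(HiddenLayer, output_delta)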

Computing the Hidden Layer Change

Computing the weight change for the hidden layer is a little more difficult, since we first have to figure out how much a given hidden layer cell affects the output layer and modify it only by that much. The value sum1 is used to figure out the amount of modification for a given hidden layer cell by determining its “efficacy”, or amount of effect, on the output layer.

For j = 0 To number_of_hidden_layer_cells
   sum1 = 0
   For m = 0 To number_of_output_layer_cells
      sum1 = sum1 + output_delta[m] * OutputWeight[j, m]
   Next m
   hidden_delta[j] = HiddenLayer[j] * (1 - HiddenLayer[j]) * sum1
   total_error = total_error + abs(hidden_delta[j])
Next j

Figure 4 - Determining the Hidden Layer Change

As figure 4 shows, we go through each of the hidden layer cells and figure out its efficacy on each cell of the output layer. We then use that to determine the amount of change, hidden_delta[j], for each of the hidden layer cells. Again, we add the size of the change for each cell to the total_error variable so we can tell when we are close to a solution.
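
In the same array notation, the "efficacy" computation of figure 4 is a single matrix-vector product: each hidden cell gathers the output deltas, weighted by the connections leaving it. Again a sketch with made-up values, using the same shapes as the previous snippet.

import numpy as np

HiddenLayer = np.random.rand(10)
output_delta = np.random.rand(4)
OutputWeight = np.random.uniform(-0.5, 0.5, (10, 4))

efficacy = OutputWeight @ output_delta                  # sum1 for every hidden cell at once
hidden_delta = HiddenLayer * (1.0 - HiddenLayer) * efficacy
error_contribution = np.abs(hidden_delta).sum()         # what gets added to total_error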

Computing the Hidden Layer Weight Change

The last step in the algorithm is to modify the hidden layer weights. This is done exactly as it was for the output layer weights; figure 5 shows the code.

For i = 0 To number_of_input_layer_cells
   For j = 0 To number_of_hidden_layer_cells
      HiddenWeight[i, j] = HiddenWeight[i, j] + Nu * hidden_delta[j] * InputLayer[i]
   Next j
Next i

Figure 5 - Modifying the hidden layer weights
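
As with the output layer, the nested loops of figure 5 reduce to one outer-product update in array notation (a sketch with the same assumptions and made-up values as before):

import numpy as np

Nu = 0.8
InputLayer = np.random.rand(35)
hidden_delta = np.random.rand(10)
HiddenWeight = np.random.uniform(-0.5, 0.5, (35, 10))

# HiddenWeight[i, j] += Nu * hidden_delta[j] * InputLayer[i], for every i and j
HiddenWeight += Nu * np.outer(InputLayer, hidden_delta)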

Again, we modify each weight to be what it previously was, plus the learning constant, Nu, times the amount of change required for the hidden layer cell, times the input layer value.

An Implementation of the Algorithm

The author used the above algorithm in a small neural net which was able to learn the binary values of images of the numerals 0 through 9. Each numeral was input into a 35-cell input layer organized as a 5 x 7 matrix. The input layer mapped into 10 hidden layer cells, and the hidden layer cells mapped to 4 output layer cells. By inputting a given numeric character into the 5 x 7 input matrix, the network produced the four-bit binary representation of that numeral at the output cells. The network was trained on all 10 numerals, 0 - 9. It took about 1 minute for the error to settle below the threshold for any training session. After training, the network reliably recognized characters even in the presence of upwards of 15 percent noise; in some cases the input numeral was unidentifiable by the author, but the network correctly identified it. Figure 6 illustrates the structure of the network in this application and figure 7 shows the complete learning algorithm used.

' Backpropagation learning algorithm
Nu = 0.8
Threshold = 0.1
Repeat
   total_error = 0
   for sample = 0 to number_of_training_samples
      ' Generate the correct answer for this sample
      t = generate_t(sample)
      ' Determine the actual answer of the network with the current weights
      y = generate_y(sample)
      ' Determine the output layer change
      for j = 0 to number_of_output_layer_cells
         output_delta[j] = y[j] * (1 - y[j]) * (t[j] - y[j])
         total_error = total_error + abs(output_delta[j])
      next j
      ' Modify the output layer weights
      for i = 0 to number_of_hidden_layer_cells
         for j = 0 to number_of_output_layer_cells
            OutputWeight[i, j] = OutputWeight[i, j] + Nu * output_delta[j] * HiddenLayer[i]
         next j
      next i
      ' Determine the hidden layer change
      for j = 0 to number_of_hidden_layer_cells
         sum1 = 0
         for m = 0 to number_of_output_layer_cells
            sum1 = sum1 + output_delta[m] * OutputWeight[j, m]
         next m
         hidden_delta[j] = HiddenLayer[j] * (1 - HiddenLayer[j]) * sum1
         total_error = total_error + abs(hidden_delta[j])
      next j
      ' Modify the hidden layer weights
      for i = 0 to number_of_input_layer_cells
         for j = 0 to number_of_hidden_layer_cells
            HiddenWeight[i, j] = HiddenWeight[i, j] + Nu * hidden_delta[j] * InputLayer[i]
         next j
      next i
   next sample
while total_error > Threshold

Figure 7 - A complete backpropagation training algorithm
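
For comparison, here is a compact Python/numpy translation of figure 7, sized for the 35-10-4 network described above. It is a sketch rather than the author's original program: a sigmoid forward pass is assumed (the y * (1 - y) and HiddenLayer * (1 - HiddenLayer) factors imply sigmoid cells), bias terms are omitted just as in the pseudocode, an epoch cap is added so the sketch always stops, and random placeholder data stands in for the 5 x 7 digit bitmaps and their 4-bit binary targets.

import numpy as np

rng = np.random.default_rng(0)

n_input, n_hidden, n_output = 35, 10, 4
Nu = 0.8                    # learning rate
Threshold = 0.1             # stop once the summed error falls below this

# Placeholder training set: 10 samples of 35 inputs with 4-bit targets.
# In the article these were the digit bitmaps and their binary codes.
training_inputs = rng.random((10, n_input))
training_targets = rng.integers(0, 2, size=(10, n_output)).astype(float)

# Initialize all weights to small random values, as the article advises.
HiddenWeight = rng.uniform(-0.5, 0.5, (n_input, n_hidden))
OutputWeight = rng.uniform(-0.5, 0.5, (n_hidden, n_output))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(20000):                  # cap added so the sketch terminates
    total_error = 0.0
    for x, t in zip(training_inputs, training_targets):
        # Forward pass with the current weights (generate_y in figure 7).
        HiddenLayer = sigmoid(x @ HiddenWeight)
        y = sigmoid(HiddenLayer @ OutputWeight)

        # Output layer change (figure 2).
        output_delta = y * (1.0 - y) * (t - y)
        total_error += np.abs(output_delta).sum()

        # Modify the output layer weights (figure 3).
        OutputWeight += Nu * np.outer(HiddenLayer, output_delta)

        # Hidden layer change (figure 4), using the weights as figure 7 leaves them.
        hidden_delta = HiddenLayer * (1.0 - HiddenLayer) * (OutputWeight @ output_delta)
        total_error += np.abs(hidden_delta).sum()

        # Modify the hidden layer weights (figure 5).
        HiddenWeight += Nu * np.outer(x, hidden_delta)

    if total_error <= Threshold:            # figure 7's stopping test
        break

print("epochs:", epoch + 1, "final error:", total_error)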